GH-49905: [Archery] Fix archery benchmark diff with pandas 3 #49912

Merged
pitrou merged 1 commit into apache:main from AntoinePrv:archery-pandas-3 on May 6, 2026

@AntoinePrv
Contributor

@AntoinePrv AntoinePrv commented May 4, 2026

Rationale for this change

Make archery benchmark diff work with pandas >= 3.0

What changes are included in this PR?

Another way of patching the module

Are these changes tested?

Yes, locally running archery benchmark diff with both pandas 2 and 3

Are there any user-facing changes?

No

@github-actions

github-actions Bot commented May 4, 2026

⚠️ GitHub issue #49905 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions Bot added the "awaiting review" label May 4, 2026
@pitrou
Member

pitrou commented May 4, 2026

Thanks a lot @AntoinePrv .

@jorisvandenbossche Could you please take a quick look at this? (also is there a way to make Pandas behavior more predictable here?)

@jorisvandenbossche
Member

Hmm, that's annoying.

I don't directly understand why this is failing, though (but I can reproduce it locally by setting sys.modules['pyarrow'] = None first and then importing pandas). In pandas, in Cython code, we essentially do this:

```cython
cdef bint PYARROW_INSTALLED = False

try:
    import pyarrow as pa

    PYARROW_INSTALLED = True
except ImportError:
    pa = None
```

Trying to import pyarrow after sys.modules['pyarrow'] = None does raise a ModuleNotFoundError, which is a subclass of ImportError, and so I think should be correctly caught by the block above?
Unless that code does not behave exactly as expected when in cython.
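
To make the pure-Python expectation above concrete, here is a minimal sketch. It uses a hypothetical module name (`hypothetical_blocked_module`) instead of `pyarrow` so that a real install is not disturbed, and the helper `import_error_for_blocked` is only an illustration, not code from pandas or archery.

```python
import importlib
import sys

# Hypothetical module name, so a real pyarrow install is left untouched.
BLOCKED = "hypothetical_blocked_module"


def import_error_for_blocked(name):
    """Block `name` via sys.modules[name] = None and return the error raised on import."""
    sys.modules[name] = None
    try:
        importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is caught here too
        return exc
    finally:
        sys.modules.pop(name, None)  # undo the block
    return None


err = import_error_for_blocked(BLOCKED)
print(type(err).__name__, "-", err)
print(issubclass(ModuleNotFoundError, ImportError))
```

In plain Python the `except ImportError` branch is taken, which is the behaviour the try/except block above relies on.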

@raulcd
Member

raulcd commented May 5, 2026

@jorisvandenbossche I can't reproduce this with Pandas 3:

```python
>>> import sys
>>> sys.modules['pyarrow'] = None
>>> import pandas
>>> pandas.__version__
'3.0.2'
>>> import pyarrow
Traceback (most recent call last):
  File "<python-input-9>", line 1, in <module>
    import pyarrow
ModuleNotFoundError: import of pyarrow halted; None in sys.modules
```

@jorisvandenbossche
Member

@raulcd it does not yet happen on import, but on first usage that checks input for being a pyarrow array, such as pd.DataFrame({"col": [1, 2, 3]})

Essentially, I think this boils down to Python imports in Cython working a bit differently. In pure Python, import pyarrow raises a ModuleNotFoundError if you set the module to None in sys.modules. But in Cython, it actually "finds" the module (it doesn't raise an error) but sets the variable for the imported module to None (i.e. the value that is set in sys.modules).
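
As a pure-Python analogy only (this is not Cython's actual import code), the difference can be pictured with a hypothetical helper, `cache_first_import`, that trusts whatever is stored in sys.modules before asking the import system. The module name is made up for the illustration.

```python
import importlib
import sys


def cache_first_import(name):
    """Hypothetical helper: trust sys.modules before asking the import system."""
    if name in sys.modules:
        return sys.modules[name]  # a None placeholder is returned as-is
    return importlib.import_module(name)


NAME = "hypothetical_blocked_module"  # made-up name for the illustration
sys.modules[NAME] = None
try:
    # The cache-first lookup "finds" the entry and returns None without raising.
    found = cache_first_import(NAME)

    # The regular import machinery refuses the None placeholder instead.
    try:
        importlib.import_module(NAME)
        regular_import_raised = False
    except ImportError:
        regular_import_raised = True
finally:
    del sys.modules[NAME]

print(found, regular_import_raised)
```

With a lookup like this, a flag set right after the import statement would be switched on even though the bound name is None.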

We could specifically check for that case on the pandas side, although that feels like a bit of a hack. We could also move that try/except to Python code, so it works more as expected.
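
A rough sketch of what such a check could look like, written in plain Python. The helper name `probe_optional` and the blocked module name are hypothetical, and this is not the actual pandas code: the idea is simply to derive the "installed" flag from the value that ends up bound, rather than from the import statement not raising.

```python
import importlib
import sys


def probe_optional(name):
    """Return (module, installed), treating a None placeholder as not installed."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        module = None
    # Derive the flag from the bound value, so a None "module" never counts.
    return module, module is not None


# A module that is always available in CPython.
json_mod, json_ok = probe_optional("json")

# A module blocked the same way as in this thread (hypothetical name).
NAME = "hypothetical_blocked_module"
sys.modules[NAME] = None
try:
    blocked_mod, blocked_ok = probe_optional(NAME)
finally:
    del sys.modules[NAME]

print(json_ok, blocked_ok)
```

The same `module is not None` test would apply if the import itself returned None instead of raising, which is the case described above for Cython.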

@AntoinePrv
Contributor Author

@jorisvandenbossche what about the proposed fix of this PR? It does solve the issue.

@jorisvandenbossche
Member

Ah, yes, I am certainly fine with that workaround to have archery working (it just should not be necessary ..)

Member

@pitrou pitrou left a comment

Thanks for this @AntoinePrv !

@pitrou pitrou merged commit 61c96ca into apache:main May 6, 2026
28 checks passed
@pitrou pitrou removed the "awaiting review" label May 6, 2026
@github-actions github-actions Bot added the "awaiting committer review" label May 6, 2026
@AntoinePrv AntoinePrv deleted the archery-pandas-3 branch May 6, 2026 13:25
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 61c96ca.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.
